Abstract. Stability tests based on the Allan variance method have become a standard procedure for evaluating
the quality of radio-astronomical instrumentation. They are very simple and simulate the situation when
detecting weak signals buried in large noise fluctuations. An outline of the basic properties of the Allan
variance under the special conditions of observing is given, and some guidelines on how to interpret the results
of the measurements are presented. Based on a rather simple mathematical treatment, clear rules for observations in
"Position-Switch", "Beam-" or "Frequency-Switch", "On-The-Fly" and "Raster-Mapping" mode are derived.
Also, a simple "rule of thumb" for estimating the optimum timing of the observations is found. The
analysis leads to a conclusive strategy for planning radio-astronomical observations. Particularly for air- and
space-borne observatories it is very important to determine how the extremely precious observing time can be
used with maximum efficiency. The analysis should help to increase the scientific yield in such cases significantly.
1. Introduction

Allan variance measurements have been demonstrated
to be a useful tool for the characterization of the stability
of radio-astronomical equipment such as millimeter
or submillimeter receivers or large bandwidth back-ends
(Schieder et al. 1985; Kooi et al. 2000). Particularly for the
development of acousto-optical spectrometers (AOS) at
the Kölner Observatorium für Sub-Millimeter Astronomie
(KOSMA) the method has played a very important role,
because it provides clear evidence, by means of a
reliable laboratory test procedure, that the spectrometers
are well suited for use at an observatory (Tolls et al. 1989). The
simple definition of the Allan variance makes it very easy
to apply such measurements also to the characterization
of the stability of other instruments. A very elementary
case is the assessment of the quality of a simple Lock-In
amplifier, for example.

For a real time spectrometer, as used in radio- 
astronomy with many simultaneously operating frequency 
channels, it is a very important condition that all channels 
are behaving identically in a statistical sense. Therefore, 
the use of the Allan variance for the investigation of the 
performance of the spectrometer is based on the assumption
that there are no differences between different frequency
channels. That this is not always correct is evident.
Thus, it is always necessary to verify the similarity
of all frequency channels of the spectrometer, for example by
investigating the baseline noise of measured spectra.
Typical problem areas are, for instance, light scatter
problems in acousto-optical spectrometers (AOS), where
speckles may affect individual channels more heavily than
others. The same is true for filterbanks, which occasionally
have some peculiar channels even in a well maintained
back-end system. But in all normal cases of well behaved
instrumentation, the Allan variance plot is a most useful 
method to precisely characterize the instrumentation in 
use. 

In general, observations at an observatory are done 
with the available instrumentation as is, and it can not 
be modified or even improved by the observer. On the 
contrary, the observer has to find the correct observing 
parameters in order to use the available hardware in a 
most economic way. It is the purpose of this paper to 
develop a strategy for an optimization of the observing 
process. For this the knowledge of the stability parame- 
ters is decisive. Once this information is available from 
an Allan variance measurement for example, it should 
be a rather straightforward matter to determine the es- 
sential parameters like length of integration per position 
on sky et cetera. The following mathematical treatment 
analyses the commonly used observing methods, i.e. "Position-",
"Beam-" or "Frequency-Switch", "On-The-Fly"
(OTF) measurements or "Raster-Mapping" based on the
information contained in the Allan variance plot. As a re- 
sult practical guidelines for the most efficient observing 
method are found, which can be used at any radio obser- 
vatory. Particularly, all space- or air-borne observatories 
require a most efficient use of the extremely precious ob- 
serving time, since any loss can usually not be compen- 
sated by a simple increase in observatory time. But also 
for ground-based observatories the results found in the 
following should be very useful. 



2. Definition of the Allan variance 

If a test procedure is defined for use at any time and at 
any location, it needs to be as simple and unique as pos- 
sible. Therefore, we understand the Allan variance as the 
ordinary statistical variance of the difference of two contiguous
measurements (see also Rau & Schieder (1984)).
One has to consider a signal-function s(t), which is the 
instantaneous output signal of a spectrometer channel or 
of a continuum detector for example. The output is now 
integrated for a time interval T representing an estimate 
of the mean signal which is stored as spectrometer data 
in the computer:

$$x(T,t) = \frac{1}{T}\int_{t-T}^{t} s(t')\,dt' \qquad (1)$$

The expectation value of $x(T,t)$ is therefore identical
with the expectation value of $s(t)$. For the observation of
weak signals, a certain number N of differences of two of
these data, a "signal-measurement" $x_s$ and a "reference-measurement"
$x_r$, are formed:

$$d = x_s - x_r \qquad (2)$$

so that the desired signal alone becomes visible when averaging.
Typically, the two measurements are done
at different times, after the telescope has moved between
two positions on sky.

In order to obtain a plausible estimate of the error of
the difference we use the standard definition of the variance:

$$\sigma^2(T) = \langle (d - \langle d\rangle)^2\rangle = \langle d^2\rangle - \langle d\rangle^2$$

The brackets "$\langle\,\rangle$" stand for the expectation value. This
definition is similar to the original definition
of the Allan variance (Allan 1966), if one considers a situation
where the expectation value of the difference is zero,
which is practically "normal" during radio-astronomical
observations:

$$\sigma_A^2(T) = \tfrac{1}{2}\langle d^2\rangle .$$

For further treatment we use the standard definition
of the variance, but leave the factor of 1/2 in place for
historical reasons, since it was already introduced by Allan
in 1966. Thus we use^1:

$$\sigma_A^2(T) = \tfrac{1}{2}\langle (d - \langle d\rangle)^2\rangle = \tfrac{1}{2}\left[\langle d^2\rangle - \langle d\rangle^2\right] \qquad (3)$$

Note that with this new definition we consider also
the possibility that the mean of the difference may not be
zero. In case there is radiometric noise only, this expression
defines the noise of a single measurement $x_s$ or $x_r$ alone
thanks to the factor of 1/2.^2
If we apply now Eq.(1), we get:

$$\sigma_A^2(T) = \frac{\sigma_s^2(T) + \sigma_r^2(T)}{2} - \left[\sigma_s^2(T)\,\sigma_r^2(T)\right]^{1/2} g_{sr}(T) \qquad (4)$$

with

$$\sigma_s^2(T) = \langle (x_s - \langle x_s\rangle)^2\rangle \quad \text{and} \quad \sigma_r^2(T) = \langle (x_r - \langle x_r\rangle)^2\rangle,$$

$$g_{sr}(T) = \frac{\langle (x_s - \langle x_s\rangle)(x_r - \langle x_r\rangle)\rangle}{\left[\langle (x_s - \langle x_s\rangle)^2\rangle\,\langle (x_r - \langle x_r\rangle)^2\rangle\right]^{1/2}} .$$

$g_{sr}(T)$ is the normalized cross-correlation function of
the two data sets $x_s$ and $x_r$. It should be understood that
the expectation values are the means averaged over the
time t. In other cases it might be the mean of a large
number of spectrometer pixels for example. Both cases
should be equivalent for the discussion here.

If we have the same statistics for both, "s" and "r"
($\sigma_s^2(T) = \sigma_r^2(T) = \sigma^2(T)$), then we get finally:

$$\sigma_A^2(T) = \sigma^2(T)\left[1 - g_{sr}(T)\right]$$

According to this expression the Allan variance is always
smaller than the normal variance of the data sets as
long as there is no "anti-correlation" with negative $g_{sr}(T)$.
The measurement of differences therefore removes all con- 
tributions from the noise which are correlated. This re- 
flects the simple fact that the impact of slow drift noise 
on the signal to noise ratio can be removed by signal mod- 
ulation techniques, as is commonly applied during obser- 
vations in radio-astronomy or when using Lock-In ampli- 
fiers in laboratory experiments. It also tells immediately 
that fast switching does not help whatsoever, if there is 
no correlation as is typical for pure white noise. 
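
The definition above can be tried on sampled data with a few lines of Python. The following is only a minimal sketch under stated assumptions: the function names are hypothetical, the samples are taken to be equally spaced readings of s(t), contiguous bins are paired as (signal, reference) without dead time, and the expectation values are replaced by plain averages over the available pairs.

```python
from statistics import fmean


def bin_means(samples, m):
    """Average `samples` in contiguous, non-overlapping bins of m points (Eq. 1)."""
    n_bins = len(samples) // m
    return [fmean(samples[k * m:(k + 1) * m]) for k in range(n_bins)]


def allan_variance(samples, m, subtract_mean=True):
    """Half the variance of d = x_s - x_r for contiguous bin pairs (Eq. 3).

    With subtract_mean=False this is the original form 1/2 <d^2>.
    """
    x = bin_means(samples, m)
    # Pair neighbouring bins as (signal, reference) with no dead time in between.
    d = [x[k] - x[k + 1] for k in range(0, len(x) - 1, 2)]
    if not d:
        raise ValueError("need at least two full bins")
    mean_d = fmean(d) if subtract_mean else 0.0
    return 0.5 * fmean([(dk - mean_d) ** 2 for dk in d])


if __name__ == "__main__":
    ramp = [float(i) for i in range(100)]          # pure linear drift, no noise
    print(allan_variance(ramp, 5, subtract_mean=False))
    print(allan_variance(ramp, 5))
```

For a pure linear drift every difference d is the same, so the form with the mean subtracted vanishes while the original form 1/2 <d^2> does not; this illustrates why Eq.(3) allows for a non-zero mean of the difference.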

We have not yet made any particular assumption
about the source of the signal- and the reference-data.
For our application here, the two data "s" and "r" are derived
from the same output signal s(t) of one spectrometer
channel. The two acquisition periods of length T for the
integration of $x_s$ and $x_r$ must therefore occur one after the
other in order to avoid any undesirable overlap between
the two measurements. For an unequivocal definition of
the instrumental Allan variance we assume that all "s"
and "r" measurements are contiguous without any dead
time in between. In real life, when observing, there will
always be some unavoidable dead time, since the telescope
needs to be moved between the On- and the Off-position
or there is time needed for data transfer etc. Any delay will
increase the impact of slow drift noise, and it will therefore
result in a different appearance of the system noise.
Such effects will be discussed in the next chapter.

1 This original definition through the difference of samples
may be altered by using the ratio of contiguous data instead.
The corresponding "ratio-variance" is then:
$\sigma_R^2(T) = \tfrac{1}{2}\langle [x_s/x_r - \langle x_s/x_r\rangle]^2\rangle$.
In case the rms of the noise is small as compared with the mean
$\langle s(t)\rangle$, one can easily show that $\sigma_R^2(T) = \sigma_A^2(T)/\langle s(t)\rangle^2$. This
new definition has the advantage of properly calibrating the data
even at varying gain in the system.

2 In general one has to consider the fact that there is
only a finite data set available for the calculation of a variance.
Therefore, instead of Eq.(3), one should use the standard
definition with $\sigma_A^2(T) = \frac{1}{2(N-1)}\sum_{n=1}^{N}(d_n - \bar{d})^2$ with $\bar{d} = \frac{1}{N}\sum_{n=1}^{N} d_n$.



3. The role of the minimum 

For a given integration time the signal output of one spectrometer
channel is described by Eq.(1). We can describe
the instantaneous noise signal s(t) before integrating using
the (in this case not normalized) auto-correlation function
$\gamma(\tau)$, here as a function of the delay time $\tau$:

$$\gamma(\tau) = \langle (s(t+\tau) - \langle s(t+\tau)\rangle)(s(t) - \langle s(t)\rangle)\rangle . \qquad (5)$$

The integrated signals "x" have a new auto-correlation
function $\Gamma_T(\tau)$:

$$\Gamma_T(\tau) = \langle (x(T,t+\tau) - \langle x(T,t+\tau)\rangle)(x(T,t) - \langle x(T,t)\rangle)\rangle ,$$

and we get after some manipulation, when using Eq.(1)
and (5):

$$\Gamma_T(\tau) = \frac{1}{T^2}\int_{-T}^{T} (T - |t|)\,\gamma(t+\tau)\,dt . \qquad (6)$$

According to the definition of the Allan variance in
Eq.(3) we have now:

$$\sigma_A^2(T) = \Gamma_T(0) - \Gamma_T(T) \qquad (7)$$



Frequently, instead of the auto-correlation function
$\gamma(\tau)$, the noise power spectrum S(f) is used for the description
of noise. Since the signal s(t) is real valued, one
can write (see e.g. also Barnes et al. (1971); Vessot (1976)):

$$\gamma(\tau) = \int_0^\infty S(f)\cos(2\pi f\tau)\,df \quad \text{and} \quad S(f) = 4\int_0^\infty \gamma(\tau)\cos(2\pi f\tau)\,d\tau \qquad (8)$$



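How the correlation function behaves in low order approximation
for a noise spectrum like $S(f) \propto 1/f^\alpha$ is easily
found using Eq.(8) for sufficiently small $\tau > 0$:

$$\gamma(\tau) = \begin{cases}
g_c - g_\alpha\,\tau^{\alpha-1} & \text{for } 1 < \alpha < 3,\\
g_c - g_1\log(\tau) & \text{for } \alpha = 1 \text{ (flicker noise)},\\
g_c + g_\alpha/\tau^{1-\alpha} & \text{for } 0 < \alpha < 1,\\
g_0\,\delta(\tau) & \text{for } \alpha = 0 \text{ (white noise)}.
\end{cases} \qquad (9)$$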



The parameters $g_c$, $g_\alpha$, $g_1$, $g_0$ describe the actual contribution
to the correlation function. In all cases we have:
$\gamma(-\tau) = \gamma(\tau)$. According to Eq.(5), $\gamma(0)$ is identical with
the expectation value of the square of the signal, which is
equivalent to the total power contained in the noise fluctuations,
and it has to be finite. Consequently, $1/f^\alpha$ power
spectra also have to stay finite at frequencies close to zero,
at least for $\alpha > 1$, since the integral over the noise power
spectrum S(f) for zero $\tau$ must not diverge for the same
reason (see Eq.(8) for $\tau \to 0$). It means that $1/f^\alpha$ spectra
cannot exist at very small f! It is easy to deal with
the divergence problem by introducing a lower cut-off frequency
for spectra where $\alpha > 1$ (see e.g. Barnes et al. (1971)).
On the other hand, for $0 < \alpha < 1$ the power
spectra must have an upper cut-off frequency because of
the same arguments. Thus, white noise in this sense has
to be "band-limited", which is automatically the case in
any real experiment due to inevitable time constants, for
example. The special case of "flicker noise" ($\alpha = 1$) requires
both a lower and an upper cut-off frequency in
order to be realistic. Consequently, the formulas (9) are
valid within limits for $\tau$, which are also defined by the
appropriate cut-off frequencies. Important for the following
treatment is that for $1 < \alpha < 3$ Eq.(9) is valid also
for $\tau \to 0$. The range $0 < \alpha < 1$ we do not consider
any further, since these noise power spectra don't seem to
be observable under normal circumstances, at least with
standard radio-astronomical equipment.

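In this approximation we have now for the Allan variance
according to Eq.(6), (7), and (9):

$$\sigma_A^2(T) = \begin{cases}
g_\alpha\,\dfrac{4(2^{\alpha-1} - 1)}{\alpha(\alpha+1)}\,T^{\alpha-1} & 1 < \alpha < 3,\\[4pt]
g_0/T & \alpha = 0 \text{ (white noise)}.
\end{cases} \qquad (10)$$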



For $\alpha > 1$ Eq.(10) is valid for integration times T
smaller than the characteristic correlation time of the drift 
noise and larger than is determined by the highest fre- 
quency components of the noise. These two assumptions 
apply in all cases considered here. 
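
The numerical factor in Eq.(10) can be cross-checked by inserting the power-law correlation function of Eq.(9) into Eq.(6) and (7) and integrating numerically. The sketch below is illustrative only: it sets T = 1 and g_alpha = 1 (the constant g_c cancels in the difference), uses a simple midpoint rule, and its function names are hypothetical.

```python
def closed_form(alpha):
    """Coefficient of g_alpha * T**(alpha - 1) in Eq. (10)."""
    return 4.0 * (2.0 ** (alpha - 1.0) - 1.0) / (alpha * (alpha + 1.0))


def numeric_coefficient(alpha, n=20000):
    """Midpoint-rule evaluation of Gamma_T(0) - Gamma_T(T) for T = 1.

    Uses gamma(tau) = g_c - |tau|**(alpha - 1) with g_alpha = 1; the
    constant g_c drops out of the difference in Eq. (7).
    """
    p = alpha - 1.0
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        t = -1.0 + (k + 0.5) * h
        weight = 1.0 - abs(t)                     # triangular kernel of Eq. (6)
        total += weight * (abs(t + 1.0) ** p - abs(t) ** p)
    return total * h


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 2.5, 3.0):
        print(alpha, closed_form(alpha), numeric_coefficient(alpha))
```

For alpha = 2 and alpha = 3 the closed form gives 2/3 and 1, which are the prefactors that reappear in the two limiting drift cases discussed later.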

If we assume a simple power law for the drift contribution
with a well defined $\alpha$, and if we consider the additional
presence of radiometric noise, or "white noise", we
expect the Allan variance to have the following structure
as a function of integration time:

$$\sigma_A^2(T) = a/T + b\,T^\beta \qquad (\beta = \alpha - 1).$$



It is general experience with radio-astronomical as well
as ordinary laboratory equipment that the slope of the
drift contribution is found somewhere between $\beta = 1$ and
$\beta = 2$, which corresponds to $1/f^2$- and $1/f^3$-noise respectively.
Good examples of such correlation functions are
the spontaneous decay of excited molecular states with a
simple exponential correlation function, or emission from
a thermal source with a Gaussian correlation function.
When expanded in lowest order approximation, they result
in terms with $\beta = 1$ and 2 respectively. Chaotic processes
will typically lead to power-laws somewhere in between.
We have never found an indication of the presence
of 1/f-noise in any of our instruments, which would contribute
with a horizontal slope in the Allan variance plot.

Fig. 1. Artificial data set generated by random numbers (left) with white noise of Gaussian distribution (top), drift
noise (middle), and combined noise (bottom). Each data point corresponds to a sample integrated for 1 second while
the fluctuation bandwidth was set to 600 kHz. The drift noise is calculated by filtering white noise with a sufficiently
broad boxcar time-filter (width > $T_{max}$ in the Allan variance plot). To the right the (relative) Allan variance plots
of all three noise spectra are depicted. The white noise appears with a slope of -1, the drift noise with a slope of
approximately +1. The combination of both results in a typical Allan plot with a minimum at some fairly well defined
minimum time.

Within the white noise part of the Allan plot, i.e. the
regime with the slope of "-1", the radiometer equation
must be valid:

$$\sigma_A^2(T) = \frac{\langle s(t)\rangle^2}{B_{Fl}\,T} \qquad (11)$$

$B_{Fl}$ is the "fluctuation bandwidth" of the frequency
channel of the spectrometer, which is defined as:

$$B_{Fl} = \frac{\left[\int_0^\infty P(f)\,df\right]^2}{\int_0^\infty P^2(f)\,df} \qquad (12)$$

(see e.g. Kraus (1980) and references therein). P(f) is
the power response function of the frequency channel to a
monochromatic input at frequency f. $B_{Fl}$ is always larger
than the resolution bandwidth $B_{Res}$ of the channel, so that
the radiometric noise should be somewhat smaller than
is often expected. Typically $B_{Fl}$ is more than 50% larger
than $B_{Res}$.
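
As an illustration of Eq.(12), the fluctuation bandwidth of a given channel shape can be evaluated numerically. The sketch below is not taken from the paper: the Gaussian channel shape, the identification of the resolution bandwidth with the full width at half maximum (FWHM), the finite integration window around the channel centre, and all function names are assumptions made for the example.

```python
import math


def fluctuation_bandwidth(power_response, f_lo, f_hi, n=20000):
    """Eq. (12): [int P df]^2 / int P^2 df, midpoint rule on [f_lo, f_hi]."""
    h = (f_hi - f_lo) / n
    s1 = s2 = 0.0
    for k in range(n):
        p = power_response(f_lo + (k + 0.5) * h)
        s1 += p
        s2 += p * p
    return (s1 * h) ** 2 / (s2 * h)


def gaussian_channel(fwhm, f0=0.0):
    """Hypothetical Gaussian power response with the given FWHM, centred at f0."""
    sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return lambda f: math.exp(-0.5 * ((f - f0) / sigma) ** 2)


if __name__ == "__main__":
    # Frequencies in units of the FWHM, measured from the channel centre.
    print(fluctuation_bandwidth(gaussian_channel(1.0), -10.0, 10.0))
    # An ideal rectangular channel of unit width for comparison.
    print(fluctuation_bandwidth(lambda f: 1.0 if abs(f) <= 0.5 else 0.0, -2.0, 2.0))
```

For the assumed Gaussian shape the ratio of fluctuation bandwidth to FWHM comes out close to 1.5, in line with the statement that the fluctuation bandwidth typically exceeds the resolution bandwidth by about 50%, while an ideal rectangular channel has a fluctuation bandwidth equal to its width.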

In most practical cases it is very useful to refer to 
the particular integration time in the Allan variance plot 



where the minimum occurs. This minimum describes the 
turn-over point where the radiometric noise with a slope 
of —1 in the logarithmic plot becomes dominated by the 
additional and undesired drift noise (see Fig. 1). Above 
the minimum time the rms of the measurements becomes 
much larger than is anticipated by the radiometer equa- 
tion alone. Intuitively, the minimum time might appear 
as an upper limit for the integration on individual po- 
sitions during radio-astronomical observations, but the 
Allan variance plot offers a lot more detailed advice when 
planning the most efficient observing strategy under the 
given circumstances. Since any additional noise above the 
radiometric level is very unfavorable, one has to find the 
optimum integration time, where the loss due to inevitable 
dead time during slew of the telescope etc. is as little as 
possible, and where the impact of drift contributions is 
nearly negligible at the same time. To find this best com- 
promise is the goal of the following chapters. 

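By use of the minimum time $T_A$ of the variance we can
now rewrite the above equation with:

$$\frac{\sigma_A^2(T)}{\langle s(t)\rangle^2} = \frac{1}{B_{Fl}\,T_A}\left(\frac{1}{t} + \frac{t^\beta}{\beta}\right) \quad \text{with } t = T/T_A \qquad (13)$$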
In a mathematical sense the minimum time appears
rather naturally as the decisive parameter for the description
of the plot. It is obvious that at the minimum the
variance is already significantly larger than the radiometric
value; for $\beta = 1$ it is doubled, for example.
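
A short numerical check of Eq.(13) may help here. The sketch below, with hypothetical function names, evaluates the bracket 1/t + t^beta/beta, locates its minimum on a grid, and compares the value at t = 1 with the purely radiometric term 1/t.

```python
def relative_allan_variance(t, beta):
    """Bracket of Eq. (13): 1/t + t**beta / beta, with t = T / T_A."""
    return 1.0 / t + t ** beta / beta


def excess_over_radiometer(t, beta):
    """Ratio of Eq. (13) to the pure white-noise term 1/t."""
    return relative_allan_variance(t, beta) * t


def argmin_on_grid(beta, t_lo=0.1, t_hi=10.0, n=9901):
    """Brute-force location of the minimum of Eq. (13) on a linear grid."""
    step = (t_hi - t_lo) / (n - 1)
    grid = [t_lo + k * step for k in range(n)]
    return min(grid, key=lambda t: relative_allan_variance(t, beta))


if __name__ == "__main__":
    for beta in (1.0, 1.5, 2.0):
        print(beta, argmin_on_grid(beta), excess_over_radiometer(1.0, beta))
```

The minimum sits at t = 1 for every slope, and the excess over the radiometric value at that point is 1 + 1/beta, i.e. a factor of 2 for beta = 1 and 1.5 for beta = 2.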

The slope of the drift part in the Allan variance plot 
is, as is seen in Fig. 1, also one of the important parame- 
ters for the characterization of the instrument. Therefore, 
we can conclude that the minimum time, the fluctuation 
bandwidth, and the slope at large integration time are 
the three parameters which fully characterize the instru- 
ment in a statistical sense. All three parameters are di- 
rectly accessible from the Allan variance plot once there 
are sufficient data collected for a reliable evaluation. It is 
interesting to note that generally the outcome of an Allan 
variance test looks nearly identical to previous ones as 
long as the instrumentation used for the test is not al- 
tered. This is particularly useful for checking the health 
of an instrument from time to time. Certainly, there are 
other methods to describe the noise performance of a ra- 
diometer like the plot of the noise power spectrum or the 
correlation function or else, but it seems rather natural to 
use the Allan variance plot, since it is directly related to 
the normal observing procedure when observing an "On"- 
and an "Off-position" with a radio-telescope. 

If the fluctuation bandwidth $B_{Fl}$ is changed, the minimum
also shifts due to the changing level of white noise,
but, despite the change of the leading factor, Eq.(13) is
not altered due to the normalization of the time with the
Allan variance minimum time. How the radiometric contribution
decreases with increasing fluctuation bandwidth
is clear from the radiometer equation. However, the
drift contribution should not change, since it does not depend
on the shape of the filter-function of the actual spectrometer
channel. The minimum therefore shifts to smaller
times with increasing $B_{Fl}$ like

$$T_A' = T_A\,(B_{Fl}/B_{Fl}')^{1/(\beta+1)} \qquad (14)$$

This formula should help when considering the stability 
of the spectrometer output while co-adding adjacent pix- 
els for example. (The problem, how the fluctuation band- 
width changes when co-adding, is not so easily solved. This 
is discussed in the appendix.) 

Co-adding frequency pixels of a spectrometer output 
is standard practice in radio-astronomy when dealing with 
very broad emission lines e.g. from other galaxies. Thus 
it is not uncommon to finally discuss spectra with an 
effective fluctuation bandwidth of the order of 50 MHz 
by binning several spectrometer channels. A typical mini- 
mum time of a complete radiometer system at an observa- 
tory is somewhere around 30 seconds or so at a resolution 
of 1 MHz of the spectrometer. According to Eq.(14) one
would expect a shift of the minimum time to values some- 
where between 4 and 8 seconds for the bins. A much larger 
bandwidth one has to deal with, when measuring contin- 
uum signals with large bandwidth bolometers. A typical 
effective bandwidth may be of the order of some 50 GHz. 
In this case the minimum of the Allan variance moves to 



values between 0.1 and 0.8 seconds, when assuming the 
origin of the white noise is still just radiometric while the 
drift noise remains as before. It is clear that the integra- 
tion time used for sampling on each position may be a few 
seconds in the first case, but has to be less than 100 msec 
in the second. 
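
The two numerical examples above follow directly from Eq.(14). A minimal sketch, using the 30 s minimum time at 1 MHz quoted in the text and a hypothetical helper function:

```python
def shifted_minimum_time(t_a, b_fl, b_fl_new, beta):
    """Eq. (14): T_A' = T_A * (B_Fl / B_Fl')**(1 / (beta + 1))."""
    return t_a * (b_fl / b_fl_new) ** (1.0 / (beta + 1.0))


if __name__ == "__main__":
    t_a = 30.0      # seconds at 1 MHz, the typical value quoted in the text
    for label, b_new in (("50 MHz bin", 50e6), ("50 GHz bolometer", 50e9)):
        lo = shifted_minimum_time(t_a, 1e6, b_new, beta=1)
        hi = shifted_minimum_time(t_a, 1e6, b_new, beta=2)
        print(f"{label}: {lo:.2f} s (beta=1) to {hi:.2f} s (beta=2)")
```

This gives roughly 4.2 to 8.1 s for the 50 MHz bins and 0.13 to 0.81 s for the 50 GHz continuum case, matching the ranges stated above.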



4. Using the information contained in the Allan 
variance plot 

As was mentioned above, the Allan variance plot provides 
information about what to expect in case there are no gaps 
in time between the corresponding measurements "signal" 
(On) and "reference" (Off). This is very close to the stan- 
dard situation during observing, but now the presence of 
dead time has to be included into the discussion. When in- 
vestigating the simple description of the Allan variance as 
a function of integration time from above it seems plausi- 
ble that the plot should also provide all information about 
the impact of drift noise, if there is dead time between the 
two measurements. How to do this is fairly straightfor- 
ward, and, in order to keep things short, we present the 
mathematical treatment only briefly. 

4.1. Position-Switch observations 

Position-Switch measurements with one signal integration
(On) per reference measurement (Off) are very common
for the observation of single positions in an extended
source, for example. In other cases Beam-Switch with a
wobbling secondary mirror or Frequency-Switch measurements
are applied, since these methods seem to be more
promising for the resulting signal to noise ratio. In terms
of the mathematical treatment, all these methods are
identical, only the typical time scale is different. In practice
some dead time needs to be included in the observing
procedure, but both On- and Off-integration are assumed
to be of equal length.^3 Following Eq.(1) we have for the
signal- and the reference-measurement:

$$x_s(T,t) = \frac{1}{T}\int_{t-T}^{t} dt'\,s(t'), \qquad x_r(T,t) = \frac{1}{T}\int_{t+T_d}^{t+T_d+T} dt'\,s(t')$$

when including the delay time $T_d$ between the end of
the On-integration and the beginning of the Off-integration.
For the error estimate of the difference of these two measurements
we get now with the help of Eq.(5), (6), and with
$\sigma_A^2(T,T_d) = \Gamma_T(0) - \Gamma_T(T_d + T)$ (similar to Eq.(7)):

$$\frac{\sigma_A^2(T,T_d)}{\langle s(t)\rangle^2} = \frac{1}{B_{Fl}\,T_A}\left(\frac{1}{t} + \frac{1}{\beta}\,f(t,d)\right), \qquad (15)$$

$t = T/T_A$, $d = T_d/T_A$ with
$f(t,d) = t + \tfrac{3}{2}d$ for $\beta = 1$, and
$f(t,d) = [t + d]^2$ for $\beta = 2$.

It is possible to derive suitable expressions for arbitrary
values of $\beta$, but in the following treatment we concentrate
on the two extreme cases $\beta = 1$ and 2 only. $\sigma_A^2(T,T_d)$
describes now the noise found with one single pair On and
Off. It is most efficient to move the telescope only every
second time so that the observing sequence is
On-Off/Off-On/On-Off ... instead of On-Off/On-Off/On-Off ... (This
is also true for Beam-Switch measurements!) In this case
we have for the duration of each complete cycle with one
On- and one Off-integration:

$$T_c = 2T + T_d$$

3 The assumption of equal length is only valid for identical
noise levels of both measurements $x_s$ and $x_r$. If the emission
from the two positions is very different and not small in comparison
to the receiver noise temperature, an equal length of
the two integrations is no longer a proper choice. This would
apply when studying emission from the sun for example, but
in radio-astronomy, it would be an exceptional situation.



Usually, the measurement is repeated several times and
the result is co-added to improve the signal to noise ratio.
Then we have K such pairs, which are measured within
a total observing time $T_{obs}$:

$$T_{obs} = K\,T_c .$$

Since the variance should develop like 1/K, and since the
variance of the On-Off difference of a single pair is
$2\sigma_A^2(T,T_d)$ because of the factor 1/2 in Eq.(3), we have
finally for the variance of the complete observation on one
On-position:^4

$$\frac{\sigma_K^2(T,T_d)}{\langle s(t)\rangle^2} = \frac{1}{K}\,\frac{2\,\sigma_A^2(T,T_d)}{\langle s(t)\rangle^2} = \frac{4}{B_{Fl}\,T_{obs}}\left(\frac{1}{t} + \frac{f(t,d)}{\beta}\right)\left(t + \frac{d}{2}\right) \qquad (16)$$

Any realistic drift scenario can be described by this
formula, and the result must be located within the range
of the two limiting values of $\beta$. For a useful calculation it is
now mandatory that the minimum
time $T_A$ is known from an Allan variance measurement.

Fig. 2 shows the shape of Eq.(16) as a function of the
relative integration time t for a few values of d. For each
d > 0 the function has exactly one fairly broad minimum,
and it is plausible that only in this minimum the observation
can be done with maximum efficiency. Any other t
leads to a higher noise level, i.e. to lower efficiency within
a given observing time. This can be explained by the facts
that with very short integration a lot of time is wasted
while moving the telescope, and that at very long integration
time the drift noise starts to deteriorate the signal to
noise ratio on the other hand. In Fig. 3 the optimum integration
time at the minimum of the variance is shown for
both cases $\beta = 1$ and 2 as a function of the relative dead
time d. The preferred relative integration time t is always
significantly smaller than unity, which leads to the important
conclusion that the integration time should always
be considerably smaller than the Allan variance minimum
time. With a realistic drift noise contribution ($1 < \beta < 2$)
the optimum integration time will be located somewhere
between the two solid lines in the plot. For the figure, also
those limits for the integration time have been computed
where the rms-noise is increased by less than 1% as compared
to the optimum. The dotted curves indicate these
limits for both $\beta$, and it appears that these regions overlap
largely. The hatched area in the plot indicates where
this overlap-region is found. It means that for any realistic
scenario it is always possible to find an integration time
with almost perfect noise performance independent of the
actual drift characteristics of the system. Consequently,
the precise knowledge of the drift slope $\beta$ is not really
essential for the optimization procedure.

4 At long total observing time the reduction of the variance
like 1/K can be proven for any realistic noise power spectrum
when using the fact that the noise correlation function must
stay finite for $\tau \to 0$ (see also above).
Fig. 2. The development of the rms of Position-Switch
measurements as a function of integration time for a drift
slope of $\beta = 1$ in the Allan variance plot (see Eq.(16)).
The curves are calculated for several delay times between
On- and Off-position (d = 0,...,0.25). The dotted curve
connects all minima of the curves and represents the optimum
integration time for all delays. The values of the
delay time d as well as of the integration time are given
in units of the Allan variance minimum time.

As was mentioned before, with a standard low resolution
spectrometer one typically finds an Allan variance
minimum of a complete radiometer system in the range
of 30 seconds or so. Chopped measurements, using a wobbling
secondary telescope mirror for example, are considered
as the ideal method for point-like sources to reduce
the impact of drift noise on the appearance of the baselines
of the spectra. If the chop delay, i.e. the time to move
the subreflector between the two positions, is 100 ms
for example, the optimum integration time per position
is found near 4 seconds following Eq.(16). The situation
seems to be different for the case d = 0, as it would apply
for Frequency-Switch measurements for example, since the
switch between the two nearby frequencies takes negligible
time. But, as is visible in Fig. 2, the increase in rms noise
is fairly marginal (< 1%) even for integration times T up
to 14% of $T_A$. This means that in all practical cases it is of no
use to switch at high speed; on the contrary, the efficiency
of the observation might suffer if dead time
is involved. Even for spectra at moderately reduced frequency
resolution the required integration time does not
drop significantly below 1 second. It is therefore important
to note that a higher chop frequency is only required for
continuum measurements with very large bandwidth.

Fig. 3. Optimum integration time as a function of On-Off
delay for the two extreme drift contributions with $\beta = 1$
and $\beta = 2$ as found from Eq.(16). The dotted curves represent
the intervals where the rms is increased by 1% at
most for both values of $\beta$. The hatched area defines the
regime where the rms increase is less than 1% independent
of the actual value of $\beta$. In this area the preferred choice
of the integration time is found. The values of the delay
time d as well as of the integration time are given in units
of the Allan variance minimum time.
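
The optimum integration time discussed above can be found numerically by minimizing the t-dependent part of Eq.(16), (1/t + f(t,d)/beta)(t + d/2), with f(t,d) taken from Eq.(15). The following is a minimal sketch with hypothetical function names; it uses a golden-section search, which assumes a single minimum as described for d > 0, and it covers only the two limiting slopes beta = 1 and beta = 2.

```python
import math


def drift_term(t, d, beta):
    """f(t, d) of Eq. (15) for the two limiting drift slopes."""
    if beta == 1:
        return t + 1.5 * d
    if beta == 2:
        return (t + d) ** 2
    raise ValueError("only beta = 1 or 2 are covered by Eq. (15)")


def cost(t, d, beta):
    """t-dependent part of Eq. (16): (1/t + f/beta) * (t + d/2)."""
    return (1.0 / t + drift_term(t, d, beta) / beta) * (t + 0.5 * d)


def optimum_t(d, beta, lo=1e-4, hi=2.0, tol=1e-9):
    """Golden-section search for the single minimum of cost() when d > 0."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if cost(c, d, beta) < cost(e, d, beta):
            b = e          # minimum lies in [a, e]
        else:
            a = c          # minimum lies in [c, b]
    return 0.5 * (a + b)


if __name__ == "__main__":
    t_a, t_d = 30.0, 0.1       # seconds: values used as the example in the text
    for beta in (1, 2):
        print(beta, t_a * optimum_t(t_d / t_a, beta))
    # rms increase for d = 0 at 14% of the minimum time
    print(math.sqrt(cost(0.14, 0.0, 1)), math.sqrt(cost(0.14, 0.0, 2)))
```

With a 30 s minimum time and a 100 ms chop delay the two limiting slopes bracket the value of about 4 seconds quoted above, and for d = 0 the rms at 14% of the minimum time stays within 1% of its limiting value for both slopes.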

The ideal, theoretical limit for the observing efficiency
is reached when effectively all integration time is spent
on the On-position and there is no dead time
involved. In this case we have:

$$\frac{\sigma_{th}^2(T)}{\langle s(t)\rangle^2} = \frac{1}{B_{Fl}\,T_{obs}}$$

The best possible efficiency relative to this theoretical
performance is therefore:

$$\eta = \left[\sigma_{th}^2(T)/\sigma_K^2(T,T_d)\right]^{1/2} = \frac{1}{2}\left[\left(\frac{1}{t_0} + \frac{1}{\beta}\,f(t_0,d)\right)\left(t_0 + \frac{d}{2}\right)\right]^{-1/2} \qquad (17)$$

with $t_0 = T_0/T_A$ the optimum integration time for the
given delay. This observing efficiency $\eta$ is always smaller
than 50%, since at least half of the time is "wasted" for 
the integration of the Off-signal. Clearly, the longer the 
dead time the less efficient the observation. Since the im- 
pact of the dead time is determined by its relative length 



when comparing with the Allan variance minimum time,
a larger $T_A$ helps as well. (A plot of Eq.(17) can be found
in Fig. 5.) It should be kept in mind that the efficiency
calculated here is the best possible for a given d. If other 
integration times are chosen, the efficiency will definitely 
become worse! One should also be aware of the fact that 
the total observing time has to be increased by a factor 
proportional to the square of the inverse efficiency to com- 
pensate for the reduced efficiency, which might become a 
high price to pay for a non-appropriate observing strategy. 
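
To get a feeling for the numbers, Eq.(17) can be evaluated directly. The sketch below is only an illustration: the function names are hypothetical, the optimum is located by a brute-force grid search, only the limiting slopes beta = 1 and beta = 2 are handled, and the quadratic time penalty simply restates the scaling described in the paragraph above.

```python
def efficiency(t, d, beta):
    """Eq. (17) evaluated at relative integration time t (in units of T_A)."""
    f = t + 1.5 * d if beta == 1 else (t + d) ** 2   # Eq. (15), beta = 1 or 2 only
    return 0.5 / ((1.0 / t + f / beta) * (t + 0.5 * d)) ** 0.5


def best_efficiency(d, beta, n=20000, t_max=2.0):
    """Maximum of efficiency() over a linear grid of t (brute force)."""
    grid = (t_max * (k + 1) / n for k in range(n))
    return max(efficiency(t, d, beta) for t in grid)


def time_penalty(eta, eta_ref):
    """Factor by which the observing time grows when eta drops below eta_ref."""
    return (eta_ref / eta) ** 2


if __name__ == "__main__":
    for d in (0.01, 0.1, 0.25):
        eta = best_efficiency(d, 1)
        print(d, eta, time_penalty(eta, 0.5))
```

The printed penalty is the factor by which the total observing time has to grow, relative to the 50% limit, to reach the same noise level.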

4.2. Mapping 

Another and possibly more interesting case is the situation
when measuring maps either by Raster-Mapping
or On-The-Fly. In both cases there are N different On-positions
per Off-position in one cycle; the only difference
is that for Raster-Mapping there is some dead time between
the different On-positions which does not appear
during OTF observations. It is found in the literature that
the Off-integration time should be $\sqrt{N}$ times longer than
the On-integration time (Ball 1976). This advice leaves
open the question of how long the On-integration should last. For
the following treatment of this question we assume that we
have an On-integration time $T_s$, an Off-integration time
$T_r$, a dead time $T_{ds}$ between each of the On-measurements,
another dead time $T_{dr}$ to move from the last On- to the
Off-position, and a different dead time $T_{dc}$ to move the
telescope back to the first On-position to begin
the next cycle. It is plausible that $T_{dc}$ will not be identical
with $T_{dr}$, since the first and last On-position are
not the same, and the time to move between the positions
(with different velocity requirements in OTF-mode
as well) is definitely different.

The delay between one of the On-positions and the
Off-position also depends on the number of Ons in
between. If we consider the worst-case situation, we have
to investigate the On-Off pair with the maximum delay
involved, which is the first On-position when putting the
Off at the end of the cycle. The delay $T_d$ is then:

$$T_d = (N-1)(T_s + T_{ds}) + T_{dr} \quad {\rm or} \quad d = (N-1)(s + d_s) + d_r \qquad (18)$$

Here and in the following we use $d = T_d/T_A$, $d_s = T_{ds}/T_A$,
$d_r = T_{dr}/T_A$, $d_c = T_{dc}/T_A$, $s = T_s/T_A$ and
$r = T_r/T_A$.

We also have to take into account now that the integration
time for On is different from that for Off. Hence we
write:

$$x_s(T_s,t) = \frac{1}{T_s}\int_{t-T_s}^{t} s(t')\,dt', \qquad x_r(T_r,t) = \frac{1}{T_r}\int_{t+T_d}^{t+T_d+T_r} s(t')\,dt' \qquad (19)$$

Similarly to before, we find after a straightforward
derivation using Eq.(5), (6), (9), and (19):

$$\sigma_1^2(s,r) = \Gamma_{T_r}(0) + \Gamma_{T_s}(0) - \left[w_+\,\Gamma_{T_+}(T_m) - w_-\,\Gamma_{T_-}(T_m)\right]$$

with $T_\pm = (T_r \pm T_s)/2$, $w_\pm = 2T_\pm^2/[T_r T_s]$, and
$T_m = T_+ + T_d$. When integrating one finds now:

$$\sigma_1^2(s,r) = \frac{\langle s(t)\rangle^2}{B_{Fl}\,T_A}\left(\frac{1}{s} + \frac{1}{r} + \frac{2}{\beta}\,g(s,r,d)\right) \qquad (20)$$



and for the two limiting cases of $\beta$ one gets:

$g(s,r,d) = (s + r)/2 + 3d/2$ for $\beta = 1$, and
$g(s,r,d) = [(s + r)/2 + d]^2$ for $\beta = 2$.

The function $g(s,r,d)$ is identical with $f(t,d)$ for
$s = r = t$ (see Eq.(15)). The variance found here is valid for
one pair of a particular On- and the corresponding
Off-measurement.

We now have to identify how the noise develops
if one wants to observe a full map within a given total
observing time $T_{Obs}$. One observing cycle consists of $N$
identical On-integrations ($T_s$), one Off-integration ($T_r$),
and the various dead times in between. Thus we have for
the complete cycle time $T_c$:

$$T_c = N T_s + T_r + (N-1)\,T_{ds} + T_{dr} + T_{dc} \qquad (21)$$



We assume that we want to measure a map consisting
of $L$ different On-positions. This needs $L/N$ cycles for
observing each position once. Each of the On-positions may
be measured $K$ times within the total observing time $T_{Obs}$
in order to improve the noise level. Thus we have:

$$T_{Obs} = K\,T_c \times L/N \qquad (22)$$

with $K \ge 1$. The choice of $K$ may depend on $N$, $L$
and the available total observing time $T_{Obs}$, and it has to
be chosen according to the individual needs of the observing
program. In many cases $K$ will be equal to 1. Using
Eq.(20), (21), and (22), we finally get:



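$$\sigma_K^2(s,r,N) = \frac{1}{K}\,\sigma_1^2(s,r,N) = L\,\frac{\langle s(t)\rangle^2}{B_{Fl}\,T_{Obs}}\left(\frac{1}{s} + \frac{1}{r} + \frac{2}{\beta}\,g(s,r,d)\right)\times\left(s + d_s + \frac{r + d_r + d_c - d_s}{N}\right) \qquad (23)$$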



We have now found the variance as a function of the three
variables $s$, $r$, and $N$, with the relative delays $d_s$, $d_r$, and
$d_c$ as parameters. Note that the On-Off delay $T_{dr}$ has a
different impact on the statistics than the return delay
$T_{dc}$, since the latter does not affect the drift contribution
$g(s,r,d)$ (see Eq.(18), (20), (21), and (23)).
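
To make the structure of Eq.(23) concrete, the following Python sketch evaluates its dimensionless part, i.e. $\sigma_K^2\,B_{Fl}\,T_{Obs}/(L\langle s(t)\rangle^2)$, for the two limiting drift cases. The function names are hypothetical, all times are in units of $T_A$, and only $\beta = 1$ and $\beta = 2$ are covered.

```python
def drift_g(s, r, d, beta):
    """g(s, r, d) in the two limiting cases quoted after Eq.(20)."""
    if beta == 1:
        return (s + r) / 2.0 + 1.5 * d
    if beta == 2:
        return ((s + r) / 2.0 + d) ** 2
    raise ValueError("only beta = 1 or beta = 2 are covered")


def relative_map_variance(s, r, n_on, d_s, d_r, d_c, beta):
    """Dimensionless part of Eq.(23)."""
    # Worst-case On-Off delay of the first On-position, Eq.(18).
    d = (n_on - 1) * (s + d_s) + d_r
    # Noise of one On-Off pair, bracket of Eq.(20).
    pair_noise = 1.0 / s + 1.0 / r + 2.0 / beta * drift_g(s, r, d, beta)
    # Cycle time per On-position, Eq.(21) divided by N * T_A.
    cycle_per_on = s + d_s + (r + d_r + d_c - d_s) / n_on
    return pair_noise * cycle_per_on


if __name__ == "__main__":
    base = relative_map_variance(0.03, 0.21, 50, 0.0, 0.10, 0.12, 2)
    more_dr = relative_map_variance(0.03, 0.21, 50, 0.0, 0.20, 0.12, 2)
    more_dc = relative_map_variance(0.03, 0.21, 50, 0.0, 0.10, 0.22, 2)
    print(base, more_dr, more_dc)
```

In this sketch the return delay $d_c$ enters only through the cycle-time factor, whereas $d_r$ also enters the drift term via Eq.(18), so the same increment costs more when it is added to $d_r$.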

The minimum of $\sigma_K^2(s,r,N)$ is found where all
derivatives with respect to $s$, $r$, and $N$ become zero. This
is the set of variables for which the observing efficiency
becomes the best possible under the given circumstances. (It
is simple to prove that there is exactly one minimum as
long as $s$, $r$ and $N$ are larger than zero.) Any other set
of variables will result in a degradation of the observing
efficiency. But, as was mentioned before, the use of the
relation $r = s\sqrt{N}$ leads to results very close to this
optimum. Therefore, for all practical purposes it is sufficient
to apply only a two-dimensional optimization for the two
variables $s$ and $N$:

$$\left.\frac{\partial \sigma_K^2(s,r,N)}{\partial s}\right|_{r=s\sqrt{N}} = 0 \quad {\rm and} \quad \left.\frac{\partial \sigma_K^2(s,r,N)}{\partial N}\right|_{r=s\sqrt{N}} = 0 \qquad (24)$$



It is trivial to show that the optimum number of Ons
becomes infinite in the case of OTF measurements ($d_s = 0$).
Therefore it seems advisable to use fairly large $N$
in order to be as close as possible to the optimum case
of $N \to \infty$. On the other hand, the optimum On-integration
time becomes extremely small in this case (see below),
which finds its limit in hardware constraints, for example.
Surprisingly, for Raster-Mapping with $d_s \neq 0$
there is always a finite $N$ required for an optimized
observation. This optimum $N$ depends on $d_s$, $d_r$, and
$d_c$.

Usually, it is rather difficult to make observations with
an arbitrary number of Ons per Off at a given geometry of
a particular map. It is therefore much more interesting to
derive conclusive estimates for an optimized observation
under the assumption of a predefined and fixed $N$ for both
Raster-Mapping and OTF observations. In this case one
has to find the minimum with:

$$\left.\frac{\partial \sigma_K^2(s,r,N)}{\partial s}\right|_{N\,{\rm fixed},\; r=s\sqrt{N}} = 0 \qquad (25)$$



In any case one has to investigate what impact the
chosen $N$ has on the total efficiency using Eq.(23) and
(24), in order to verify that the chosen $N$ is not too far away
from the optimum.
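
The condition of Eq.(25) can also be solved numerically instead of analytically. The sketch below is one possible approach, not the procedure used for the figures: it minimizes the dimensionless part of Eq.(23) over $s$ with a golden-section search, relying on the statement above that there is exactly one minimum. The function names and the search interval are assumptions.

```python
import math


def relative_variance(s, n_on, d_s, d_r, d_c, beta):
    """Dimensionless part of Eq.(23) with the Off-time tied to r = s*sqrt(N)."""
    if beta not in (1, 2):
        raise ValueError("only beta = 1 or beta = 2 are covered")
    r = s * math.sqrt(n_on)
    d = (n_on - 1) * (s + d_s) + d_r                      # Eq.(18)
    half_sum = (s + r) / 2.0
    g = half_sum + 1.5 * d if beta == 1 else (half_sum + d) ** 2
    return (1.0 / s + 1.0 / r + 2.0 / beta * g) * (
        s + d_s + (r + d_r + d_c - d_s) / n_on)


def optimum_on_time(n_on, d_s, d_r, d_c, beta, lo=1e-4, hi=5.0, tol=1e-7):
    """Golden-section search for the relative On-time s that satisfies Eq.(25)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        f_c = relative_variance(c, n_on, d_s, d_r, d_c, beta)
        f_e = relative_variance(e, n_on, d_s, d_r, d_c, beta)
        if f_c < f_e:
            b = e      # minimum lies in [a, e]
        else:
            a = c      # minimum lies in [c, b]
    return (a + b) / 2.0


if __name__ == "__main__":
    # OTF example: N = 50, d_s = 0, d_r = 0.1, d_c = 1.2 * d_r
    for beta in (1, 2):
        print(beta, optimum_on_time(50, 0.0, 0.10, 0.12, beta))
```

Repeating the call for several values of $N$ and comparing the resulting minima is one way to check that a chosen $N$ is not too far from the optimum.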

In order to provide some idea about the best choice
of the On-observing time $s$, the optimum integration time
in OTF mode is shown in Fig. 4 as a function of the
On-Off delay $d_r$. The delay for the return to the beginning
of the cycle is taken into account by a $d_c$ that is 20% longer
than $d_r$. The two solid curves are derived from Eq.(23) and (20)
for the two limiting cases $\beta = 1$ and $\beta = 2$. The hatched
area in the plot represents the region where the increase
of the rms stays below 1% as compared to the optimum
for both values of $\beta$. This means that for all assumed drift
slopes one is always safe when choosing an On-integration
time within this region. Such an optimized integration time
can be described by the purely empirical formula:

$$s \approx 0.53\,d^{\,0.23}/N^{0.69}, \qquad r = s\sqrt{N} \qquad (26)$$

with $d = (N-1)\,d_s + d_r + d_c$, the sum of all delays in one
cycle. The formula is also valid for Raster-Mapping and
Position-Switch measurements, and it may be used for values of
$d_r$ and $d_c$ between 0 and 1, for $d_s < 0.1$, and $N \ge 1$.

Note on the relation $r = s\sqrt{N}$: Using Eq.(23) it is easy to
verify this relation when assuming that there is no drift
contribution involved. If there is drift noise, it is also clear
from Eq.(23) that the relation is no longer valid. However, a
comparison of the results of a calculation with and without the
relation between On- and Off-time shows that the minimum
rms values differ only by amounts of the order of 0.1% or less.
Therefore the introduction of the simple relation between $s$
and $r$ remains justified.
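
As a usage illustration of Eq.(26), the following Python sketch converts an Allan variance minimum time and a set of dead times into suggested On- and Off-integration times. The function name and the example numbers are hypothetical; the formula itself is applied only within the validity range stated above.

```python
import math


def rule_of_thumb_times(t_allan, n_on, t_ds, t_dr, t_dc):
    """Return (T_s, T_r) in seconds from Eq.(26); all inputs in seconds.

    Intended for d_r and d_c between 0 and 1 and d_s < 0.1 (relative to T_A).
    """
    d = ((n_on - 1) * t_ds + t_dr + t_dc) / t_allan   # total relative delay
    s = 0.53 * d ** 0.23 / n_on ** 0.69               # relative On-time
    r = s * math.sqrt(n_on)                           # relative Off-time
    return s * t_allan, r * t_allan


if __name__ == "__main__":
    # Assumed example: T_A = 100 s, OTF with 50 Ons per Off,
    # 10 s slew to the Off-position and 12 s slew back.
    t_s, t_r = rule_of_thumb_times(100.0, 50, 0.0, 10.0, 12.0)
    print(f"T_s = {t_s:.2f} s, T_r = {t_r:.2f} s")
```

For this assumed example the suggested On-integration time comes out at a few seconds, in line with the statement in the caption of Fig. 4.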



[Figure 4: optimum On-integration time versus On-Off delay; panel labels $N = 50$, $d_s = 0$, $d_c = 1.2\,d_r$]

Fig. 4. Optimum On-integration time for OTF measurements
with 50 Ons per Off. The hatched area represents
the regime where the rms increase stays below 1% for any
$\beta$ between 1 and 2. The dotted curve in the middle
represents the suggested On-integration time using Eq.(26).
As is clearly visible, the optimum integration time is
typically of the order of a few seconds when assuming an Allan
variance minimum time near or above 100 seconds.

Finally, the overall observing efficiency can also be
found for the measurement of extended maps. The
theoretically best possible value of the variance is given by:

$$\sigma_{\rm th}^2(s,r,N) = L\,\frac{\langle s(t)\rangle^2}{B_{Fl}\,T_{Obs}}$$

where no dead time is present and virtually all observing
time is spent on the On-positions. In this case we now have
for the relative efficiency:

$$\eta = \left[\frac{\sigma_{\rm th}^2(s,r,N)}{\sigma_K^2(s,r,N)}\right]^{1/2} = \left[\left(\frac{1}{s} + \frac{1}{r} + \frac{2}{\beta}\,g(s,r,d)\right)\times\left(s + d_s + \frac{r + d_r + d_c - d_s}{N}\right)\right]^{-1/2} \qquad (27)$$

Fig. 5 depicts the optimum efficiency according to
Eq.(27) and (25) for three different $N$ ($N$ = 1, 10, and
100). The curves for $N = 1$ (dotted lines) are at the same time
the Position-Switch efficiencies (see Eq.(17)). Clearly
the OTF efficiency is much better than the Position-Switch
efficiency. At zero delay it reaches a maximum
value of $(1 + 1/\sqrt{N})^{-1}$, and it decreases monotonically
with increasing $d_r$. Again, the efficiency shown in the plot
is the maximum one can achieve under the given circumstances.
When comparing $\eta(N = 10)$ with $\eta(N = 100)$, it
is clear that $N = 100$ is the preferable choice. This example
demonstrates that it is advisable to determine whether
the desired $N$ is a reasonable choice or should
be reconsidered when planning the best strategy for the
observation.

Fig. 5. Relative optimum efficiencies of OTF measurements
for $N$ = 1, 10, and 100 On-positions per Off (see
Eq.(27)). For each $N$ both curves for $\beta = 1$ and $\beta = 2$
are plotted. It is obvious that larger $N$ lead to higher
efficiency. The dotted curves for $N = 1$ represent the
Position-Switch situation with an On-Off delay every second
time only. This is taken into account by setting
$d_c = d_s = 0$ in Eq.(23) and (27) while $N = 1$.
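
The zero-delay maximum of the efficiency quoted for Fig. 5 can be checked with a short worked calculation. The lines below are a sketch under two assumptions: all delays vanish ($d_s = d_r = d_c = 0$), and the drift term $g$ is neglected, which is justified here only in the limit of very short On-integrations.

```latex
% Zero-delay, drift-free limit of Eq.(27) with r = s\sqrt{N}
\begin{align*}
\eta^{-2} &= \left(\frac{1}{s} + \frac{1}{s\sqrt{N}}\right)
             \left(s + \frac{s\sqrt{N}}{N}\right) \\
          &= \frac{1}{s}\left(1 + \frac{1}{\sqrt{N}}\right)
             \cdot s\left(1 + \frac{1}{\sqrt{N}}\right)
           = \left(1 + \frac{1}{\sqrt{N}}\right)^{2}, \\
\eta      &= \left(1 + \frac{1}{\sqrt{N}}\right)^{-1}.
\end{align*}
```

For $N = 1$ this gives $\eta = 0.5$, the Position-Switch limit of Eq.(17); for $N = 10$ and $N = 100$ it gives about 0.76 and 0.91, respectively.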



How the efficiency develops with $N$ is visible in Fig. 6
for some fixed On-Off delays. Obviously the gain in efficiency
with increasing $N$ above $N = 50$ is rather marginal.
Therefore it is questionable whether a significant improvement
in observing efficiency is achievable when going from
$N = 50$ to $N = 100$, for example. Any reduction of the
On-Off delay time would be a much more effective measure.
On the other hand, the plot also shows how valuable an
increase in $N$ can be in case one is considering $N = 10$ or
less.




Fig. 6. Relative OTF efficiency as a function of the number
of Ons per Off for various relative On-Off delays according
to Eq.(27), (25), and (23). For each $d_r$ both curves
for $\beta = 1$ and $\beta = 2$ are plotted.
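
The trend described for Fig. 6 can be reproduced qualitatively with a small calculation. The Python sketch below is an illustration, not the code behind the figure: it evaluates Eq.(27) with $r = s\sqrt{N}$ and $d_c = 1.2\,d_r$ (the convention of Fig. 4) and finds the best efficiency for each $N$ by scanning $s$ on a logarithmic grid. The function names and the grid limits are assumptions.

```python
import math


def efficiency(s, n_on, d_s, d_r, d_c, beta):
    """Eq.(27) with r = s*sqrt(N); all times in units of T_A."""
    if beta not in (1, 2):
        raise ValueError("only beta = 1 or beta = 2 are covered")
    r = s * math.sqrt(n_on)
    d = (n_on - 1) * (s + d_s) + d_r                     # Eq.(18)
    mean_time = (s + r) / 2.0
    g = mean_time + 1.5 * d if beta == 1 else (mean_time + d) ** 2
    noise = (1.0 / s + 1.0 / r + 2.0 / beta * g) * (
        s + d_s + (r + d_r + d_c - d_s) / n_on)
    return noise ** -0.5


def best_efficiency(n_on, d_r, beta, d_s=0.0, points=4000):
    """Largest efficiency found for s between 1e-4 and 10 (log-spaced grid)."""
    d_c = 1.2 * d_r
    best = 0.0
    for i in range(points + 1):
        s = 10.0 ** (-4.0 + 5.0 * i / points)
        best = max(best, efficiency(s, n_on, d_s, d_r, d_c, beta))
    return best


if __name__ == "__main__":
    for n_on in (10, 50, 100):
        print(n_on, round(best_efficiency(n_on, 0.2, 2), 3))
```

With the assumed delay $d_r = 0.2$ the step from $N = 10$ to $N = 50$ gains considerably more efficiency than the step from $N = 50$ to $N = 100$, which is the behaviour described in the text.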
One of the remaining questions is how long one cycle
$T_c$ will last once the optimum On- and Off-integration
times have been found. Using Eq.(21) it is now simple to
calculate $T_c$ as a function of the On-Off delay time $d_r$. In
Fig. 7 the cycle time is plotted for three cases with $N$ =
1, 10, and 100. At first sight it appears surprising that
the time for a full cycle increases to values several times
longer than the Allan variance minimum time in case there
is substantial delay $d_r$. But again, the length of one cycle
depends strongly on the number of Ons per Off. Since the
On-integration time is rather small at large $N$, the larger
radiometric noise of the On-measurement dominates the
noise budget so that a longer delay with an increased
contribution of drift noise becomes acceptable. For a given
and fixed $N$ the increase of the cycle time with increasing
delay is the consequence of the fact that at larger integration
time the loss due to drift noise is less costly than the
loss due to the On-Off delay. This effect is also clearly visible
in Fig. 2 for the case of Position-Switch measurements.
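
A rough estimate of the cycle time can be obtained by inserting the rule of thumb of Eq.(26) into Eq.(21). The Python sketch below does this in units of $T_A$; it is a simplification, since Fig. 7 is based on the full optimization of Eq.(25), and the function name and example values are hypothetical.

```python
import math


def relative_cycle_time(n_on, d_s, d_r, d_c):
    """T_c / T_A from Eq.(21), with s and r taken from the rule of thumb Eq.(26)."""
    d_total = (n_on - 1) * d_s + d_r + d_c        # sum of all delays in one cycle
    s = 0.53 * d_total ** 0.23 / n_on ** 0.69     # relative On-time
    r = s * math.sqrt(n_on)                       # relative Off-time
    return n_on * s + r + (n_on - 1) * d_s + d_r + d_c


if __name__ == "__main__":
    # Assumed OTF example: N = 100, d_s = 0, d_r = 0.5, d_c = 0.6
    print(relative_cycle_time(100, 0.0, 0.5, 0.6))
```

For this assumed example the cycle lasts several Allan variance minimum times, of which the dead times account for only about one third.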
Fig. 7. Cycle time for OTF measurements as a function
of On-Off delay. The cycle time comprises $N$ On-integrations,
one Off-integration, and the dead times in
between. The three cases ($N$ = 1, 10, and 100) are calculated
from Eq.(21), (23), and (25). Similar to Fig. 5, the
Position-Switch situation is also indicated by the dotted
lines. Note that the increase of cycle time is partly due to
the time spent during slew from On to Off and back.

5. Conclusion

The discussion above provides some clear guidelines for
an optimized observing program. The first step has to be
a reliable measurement of the system Allan variance. The
word "system" includes all components of the observatory
which may possibly contribute to the noise, including,
for example, the atmospheric fluctuations. Once the
applicable dead times are known, a simple calculation of the
optimum integration time can be made by using the "rule
of the thumb" as given by Eq.(26). As was pointed out
before, Position-Switch or Chop measurements should be
done in a most economical way by moving the telescope
or the chopper only every second time. OTF or Raster-Mapping
measurements need a clear understanding of the
impact of the number of On-positions chosen for each
Off-integration. Here, too, it might be of some value to reverse
the sequence of the integrations on the various positions
every second time in order to reduce some of the loss in
time due to the slew of the telescope between the On- and
the Off-positions. It should be noted that the measurement
of large maps can be handled in different ways. If
one wants to achieve a certain signal-to-noise ratio, it might be
advisable to use larger $N$ with smaller $T_s$ and to repeat
the map several times, as described by the parameter
$K$ in Eq.(22). In any case, the suggested On- and
Off-integration time should not be drastically altered,
although the plot in Fig. 4 indicates that there is quite some
margin available.

In general it is surprising how close together the
curves for the different $\beta$ in Figs. 5, 6, and 7 lie,
which clearly validates the assumption that it is
sufficient to consider only the extreme cases for the drift
contributions. Therefore, there is no need to go too deeply
into the analysis of the drift part in the noise. It is also
one of the better pieces of news from the treatment here that
some freedom to plan the observation is still preserved. This
might be particularly important when considering the
constraints set by the observatory hardware. It is probably
not advisable to operate with too short integration intervals,
since the data flood might become overwhelming, and
the storage capacity of the computers could easily be
exceeded. Therefore, the conclusion found before that there
are no real requirements for high speed observing most of
the time is very important.

The discussion above is most useful for observations
with space-borne observatories like SWAS (Melnick et al.
2000), ODIN (Hjalmarson 1993) or FIRST (de Graauw
et al. 1998; FIRST was recently renamed to "Herschel Space
Observatory"). Since usually a satellite cannot be oriented
in space very rapidly, the impact of dead time becomes vital.
The SWAS satellite is not capable of controlling the pointing
very accurately during slew across an extended source,
so that the OTF mode is not applicable. Instead, Raster-Mapping
is the generally used procedure. On the other hand,
since SWAS is a very small satellite, it can be pointed from
one position to a second at 3 degrees distance within less
than 15 seconds. A 3-degree nod is often required during
observations in the Milky Way, since the emission of
molecules like CO is fairly extended. Nevertheless, the
loss in observing efficiency looks acceptable when considering
an Allan variance minimum time of the SWAS
receiver/backend system of about 150 seconds as found in
orbit. On the Herschel space observatory, the situation will
change drastically. We can assume that the pointing
of the telescope during slew is well defined so that OTF
measurements should be applicable. But, because
Herschel is going to be a very heavy satellite, a
movement by three degrees will last nearly as long as the
expected Allan variance minimum time.
In consequence, the value of the dead times $d_r$ and $d_c$
will be close to unity when assuming a system stability
similar to that of SWAS. This prohibits Position-Switch
measurements with the instrument, because the efficiency
would drop to values below 30%, which would certainly be
rather disappointing because of the consequences for the
extremely precious and limited observing time. Therefore,
a very careful analysis for determining the best possible
observing strategy is extremely important for such a program.

Rather different circumstances exist at ground-based
observatories. The typical dead time for a slew of 3 degrees is
of the order of a few seconds only; therefore the impact of
dead time does not appear as devastating as with space-based
observatories. A detailed planning of an observing
strategy does not seem to be so easily implemented,
particularly if other parameters like varying hardware
constraints or human limitations play a significant role
as well. Typically, the Allan variance minimum time of
most ground-based sub-millimeter observatories is rather
small, partly due to the impact of an unstable atmosphere.
Therefore, the advantage of a smaller dead time is partly
eaten away by the reduced stability. But still, as should be
clear from the discussion before, the actual situation has
to be analyzed in detail for every individual case in order
to achieve as much scientific return from the observations
as possible. For this, the analysis presented in
this paper could be essential.

Appendix A: The development of noise when 
co-adding frequency pixels 

Co-adding a couple of pixels in a measured spectrum in order
to improve the signal-to-noise ratio is general practice
when dealing with noisy spectra, but the consequences of
this procedure are not quite as trivial as one would like to
believe. For the discussion we start again with the definition
of the normalized first order correlation function as
defined in Eq.(4):

$$g_m = \frac{\langle \delta y_n\,\delta y_{n+m}\rangle}{\left[\langle \delta y_n^2\rangle\,\langle \delta y_{n+m}^2\rangle\right]^{1/2}}$$

with $\delta y_n = y_n - \langle y\rangle$.

The data $y_n$ are here the pixel components of a fully
calibrated spectrum as measured with a multi-channel
spectrometer. The index "m" describes by how many pixels
the spectrum is shifted before the multiplication of the
pixel data is done (see footnote 7). The correlation function is
symmetric, since $g_{-m} = g_m$. We assume that all $y_n$ behave
identically in a purely statistical sense. Then, the values of
$g_m$ depend only on the "distance" between the data given by
the parameter "m", and the expectation values as defined
by the brackets become independent of $n$. We now have to
determine the expected statistics of the new co-added
data set $z_n$ with:

$$z_n = \frac{1}{K}\sum_{k=1}^{K} y_{n+k}$$

with $K$ the number of co-added pixels. With the usual
definition of the variance, $\sigma_K^2 = \langle z_n^2\rangle - \langle z_n\rangle^2$, we can now
determine how the error of the new data develops:

$$\begin{aligned}
\sigma_K^2 &= \left\langle\left[\frac{1}{K}\sum_{k=1}^{K} y_{n+k}\right]^2\right\rangle - \left\langle\frac{1}{K}\sum_{k=1}^{K} y_{n+k}\right\rangle^2 \\
&= \left\langle\left[\frac{1}{K}\sum_{k=1}^{K} \delta y_{n+k}\right]^2\right\rangle \\
&= \frac{1}{K^2}\sum_{p=1}^{K}\sum_{q=1}^{K}\langle \delta y_{n+p}\,\delta y_{n+q}\rangle \\
&= \frac{\sigma_1^2}{K^2}\sum_{p=1}^{K}\sum_{q=1}^{K} g_{p-q} = \frac{\sigma_1^2}{K}\left[1 + 2\sum_{m=1}^{K-1}\left(1 - \frac{m}{K}\right) g_m\right]
\end{aligned}$$

$\sigma_1^2$ is the variance of the statistical distribution of the
initial data $y_n$. From this and the radiometer equation we
finally get:

$$\sigma_K^2 = \frac{\langle z\rangle^2}{B_K\,T} = \frac{\sigma_1^2}{K_{Box}} = \frac{\langle y\rangle^2}{K_{Box}\,B_1\,T}$$

with $K_{Box} = K/(1 + 2(1-1/K)\,g_1 + 2(1-2/K)\,g_2 + \ldots)$. The
new fluctuation bandwidth $B_K$ is therefore $K_{Box}$ times
larger than the fluctuation bandwidth $B_1$ of a single
spectrometer pixel. But the effective number of pixels $K_{Box}$
is significantly smaller than the number of co-added pixels,
since the values of the auto-correlation function are all
positive under normal circumstances. Note that the ratio
of $K$ and $K_{Box}$ is a function of $K$ itself, so that one has to
analyze the situation for the individual case accordingly.

Only the first few values of $g_m$ ($m$ not larger than
about 3) should be non-zero for a decent spectrometer,
since the overlap of the power response functions between
neighboring pixels should be small. Therefore, in the
limiting case of very large width of the bins ($K$ large), we get:

$$K_{Box} \approx K/(1 + 2g_1 + 2g_2 + 2g_3)$$

Typical values for $K_{Box}$ at large $K$ - for instance at
Nyquist sampling of the spectrum - are somewhere near
$K/2$, depending on the actual spacing and shape of the
spectrometer channels, but they may vary for different
spectrometer types.

7 In case of a finite data set with $N$ data we can convert the
definition into a more practical one using:

$$g_m = \frac{\frac{1}{N-m-1}\sum_{n=1}^{N-m}\delta y_n\,\delta y_{n+m}}{\left[\left(\frac{1}{N-m-1}\sum_{n=1}^{N-m}\delta y_n^2\right)\left(\frac{1}{N-m-1}\sum_{n=1}^{N-m}\delta y_{n+m}^2\right)\right]^{0.5}}$$

with $\delta y_n = y_n - \frac{1}{N-m}\sum_{k=1}^{N-m} y_k$ and
$\delta y_{n+m} = y_{n+m} - \frac{1}{N-m}\sum_{k=1}^{N-m} y_{k+m}$. The expectation values are estimated
here by the means over a sufficiently large number of data
($= N - m$). It is important to note that the value of this
auto-correlation function is "1" for $m = 0$ by definition.
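
The two appendix formulas translate directly into a few lines of code. The Python sketch below is an illustration with hypothetical function names: `autocorrelation` follows the finite-sample estimate of footnote 7, and `effective_pixels` evaluates $K_{Box}$ from a list of correlation values supplied by the caller.

```python
import math


def autocorrelation(y, m):
    """Finite-sample estimate of g_m following footnote 7 (y: list of floats)."""
    count = len(y) - m                       # number of usable pairs, N - m
    head, tail = y[:count], y[m:m + count]
    mean_head = sum(head) / count
    mean_tail = sum(tail) / count
    dh = [v - mean_head for v in head]
    dt = [v - mean_tail for v in tail]
    cov = sum(a * b for a, b in zip(dh, dt))
    # The common 1/(N-m-1) prefactors cancel in the ratio.
    return cov / math.sqrt(sum(a * a for a in dh) * sum(b * b for b in dt))


def effective_pixels(k, g):
    """K_Box for K co-added pixels; g[m-1] holds g_m for m = 1, 2, ..."""
    weight = 1.0
    for m, g_m in enumerate(g, start=1):
        if m >= k:                           # the sum runs only up to m = K - 1
            break
        weight += 2.0 * (1.0 - m / k) * g_m
    return k / weight


if __name__ == "__main__":
    # Assumed example: only neighbouring pixels are correlated, g_1 = 0.5
    for k in (2, 10, 1000):
        print(k, effective_pixels(k, [0.5]))
```

With the assumed value $g_1 = 0.5$ the effective number of pixels approaches $K/2$ for large $K$, which matches the typical value quoted above.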